Abstract: Traditional database systems can no longer process very large volumes of data effectively and are, for this purpose, practically obsolete. With the introduction of new automated systems and the Internet of Things (IoT), massive amounts of unstructured or semi-structured data now accumulate from heterogeneous sources. To meet current big data processing needs, Hadoop MapReduce has become a widely preferred choice among many organizations. With the recent growth of the cloud computing paradigm, on-demand distributed and parallel data-intensive processing has become much cheaper and easier on the cloud. The objective of this research paper is to measure execution time on text files of different sizes by running a simple MapReduce simulation of the word count program, which is very popular in the big data and text mining arena. In addition, an improved version of the word count program has been designed, and the variants of the word count program have been tested and simulated in the Amazon EC2 cloud environment. A comparative study of both methods has been carried out and critically reviewed.
Keywords: Big Data Analytics, Hadoop, Hadoop Distributed File System (HDFS), MapReduce, Parallel and Distributed Processing, Amazon EC2, Cloud Computing.